<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
    <title>Lucene Index Structure - Zend Framework Manual</title>

    <link href="../css/shCore.css" rel="stylesheet" type="text/css" />
    <link href="../css/shThemeDefault.css" rel="stylesheet" type="text/css" />
    <link href="../css/styles.css" media="all" rel="stylesheet" type="text/css" />
</head>
<body>
<h1>Zend Framework</h1>
<h2>Programmer's Reference Guide</h2>
<ul>
    <li><a href="../en/learning.lucene.index-structure.html">Inglês (English)</a></li>
    <li><a href="../pt-br/learning.lucene.index-structure.html">Português Brasileiro (Brazilian Portuguese)</a></li>
</ul>
<table width="100%">
    <tr valign="top">
        <td width="85%">
            <table width="100%">
                <tr>
                    <td width="25%" style="text-align: left;">
                    <a href="learning.lucene.intro.html">Zend_Search_Lucene Introduction</a>
                    </td>

                    <td width="50%" style="text-align: center;">
                        <div class="up"><span class="up"><a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a></span><br />
                        <span class="home"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></span></div>
                    </td>

                    <td width="25%" style="text-align: right;">
                        <div class="next" style="text-align: right; float: right;"><a href="learning.lucene.index-opening.html">Index Opening and Creation</a></div>
                    </td>
                </tr>
            </table>
<hr />
<div id="learning.lucene.index-structure" class="section"><div class="info"><h1 class="title">Lucene Index Structure</h1></div>
    

    <p class="para">
        In order to fully utilize <span class="classname">Zend_Search_Lucene</span>&#039;s capabilities with
        maximum performance, you need to understand it&#039;s internal index structure.
    </p>

    <p class="para">
        An <em class="emphasis">index</em> is stored as a set of files within a single directory.
    </p>

    <p class="para">
        An <em class="emphasis">index</em> consists of any number of independent
        <em class="emphasis">segments</em> which store information about a subset of indexed documents.
        Each <em class="emphasis">segment</em> has its own <em class="emphasis">terms dictionary</em>, terms
        dictionary index, and document storage (stored field values) <a href="#fnid1" name="fn1"><sup>[1]</sup></a><span class="classname">Zend_Search_Lucene</span>. All segment data is stored in
        <var class="filename">_xxxxx.cfs</var> files, where <em class="emphasis">xxxxx</em> is a segment name.
    </p>

    <p class="para">
        Once an index segment file is created, it can&#039;t be updated. New documents are added to new
        segments. Deleted documents are only marked as deleted in an optional
        <var class="filename">&lt;segmentname&gt;.del</var> file.
    </p>

    <p class="para">
        Document updating is performed as separate delete and add operations, even though it&#039;s done
        using an  <span class="methodname">update()</span> <acronym class="acronym">API</acronym> call
        <a href="#fnid2" name="fn2"><sup>[2]</sup></a><span class="classname">Zend_Search_Lucene</span> <acronym class="acronym">API</acronym>.
        This simplifies adding new documents, and allows updating concurrently with search
        operations.
    </p>

    <p class="para">
        On the other hand, using several segments (one document per segment as a borderline case)
        increases search time:
    </p>

    <ul class="itemizedlist">
        <li class="listitem">
            <p class="para">
                retrieving a term from a dictionary is performed for each segment;
            </p>
        </li>

        <li class="listitem">
            <p class="para">
                the terms dictionary index is pre-loaded for each segment (this process takes the
                most search time for simple queries, and it also requires additional memory).
            </p>
        </li>
    </ul>

    <p class="para">
        If the terms dictionary reaches a saturation point, then search through one segment is
        <em class="emphasis">N</em> times faster than search through <em class="emphasis">N</em> segments
        in most cases.
    </p>

    <p class="para">
        <em class="emphasis">Index optimization</em> merges two or more segments into a single new one. A
        new segment is added to the index segments list, and old segments are excluded.
    </p>

    <p class="para">
        Segment list updates are performed as an atomic operation. This gives the ability of
        concurrently adding new documents, performing index optimization, and searching through the
        index.
    </p>

    <p class="para">
        Index auto-optimization is performed after each new segment generation. It merges sets of
        the smallest segments into larger segments, and larger segments into even larger segments,
        if we have enough segments to merge.
    </p>

    <p class="para">
        Index auto-optimization is controlled by three options:
    </p>

    <ul class="itemizedlist">
        <li class="listitem">
            <p class="para">
                <em class="emphasis">MaxBufferedDocs</em> (the minimal number of documents required
                before the buffered in-memory documents are written into a new segment);
            </p>
        </li>

        <li class="listitem">
            <p class="para">
                <em class="emphasis">MaxMergeDocs</em> (the largest number of documents ever merged by
                an optimization operation); and
            </p>
        </li>

        <li class="listitem">
            <p class="para">
                <em class="emphasis">MergeFactor</em> (which determines how often segment indices are
                merged by auto-optimization operations).
            </p>
        </li>
    </ul>

    <p class="para">
        If we add one document per script execution, then <em class="emphasis">MaxBufferedDocs</em> is
        actually not used (only one new segment with only one document is created at the end of
        script execution, at which time the auto-optimization process starts).
    </p>
<div class="footnote"><a name="fnid1" href="#fn1"><sup>[1]</sup></a><span class="para footnote">Starting with
                Lucene 2.3, document storage files can be shared between segments; however,
                 doesn't use this
                capability</span></div>
<div class="footnote"><a name="fnid2" href="#fn2"><sup>[2]</sup></a><span class="para footnote">This call is provided only by Java Lucene now, but it's planned to extend
            the  with similar
            functionality</span></div>
</div>
        <hr />

            <table width="100%">
                <tr>
                    <td width="25%" style="text-align: left;">
                    <a href="learning.lucene.intro.html">Zend_Search_Lucene Introduction</a>
                    </td>

                    <td width="50%" style="text-align: center;">
                        <div class="up"><span class="up"><a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a></span><br />
                        <span class="home"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></span></div>
                    </td>

                    <td width="25%" style="text-align: right;">
                        <div class="next" style="text-align: right; float: right;"><a href="learning.lucene.index-opening.html">Index Opening and Creation</a></div>
                    </td>
                </tr>
            </table>
</td>
        <td style="font-size: smaller;" width="15%"> <style type="text/css">
#leftbar {
	float: left;
	width: 186px;
	padding: 5px;
	font-size: smaller;
}
ul.toc {
	margin: 0px 5px 5px 5px;
	padding: 0px;
}
ul.toc li {
	font-size: 85%;
	margin: 1px 0 1px 1px;
	padding: 1px 0 1px 11px;
	list-style-type: none;
	background-repeat: no-repeat;
	background-position: center left;
}
ul.toc li.header {
	font-size: 115%;
	padding: 5px 0px 5px 11px;
	border-bottom: 1px solid #cccccc;
	margin-bottom: 5px;
}
ul.toc li.active {
	font-weight: bold;
}
ul.toc li a {
	text-decoration: none;
}
ul.toc li a:hover {
	text-decoration: underline;
}
</style>
 <ul class="toc">
  <li class="header home"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></li>
  <li class="header up"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></li>
  <li class="header up"><a href="learning.html">Conhecendo o Zend Framework</a></li>
  <li class="header up"><a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a></li>
  <li><a href="learning.lucene.intro.html">Zend_Search_Lucene Introduction</a></li>
  <li class="active"><a href="learning.lucene.index-structure.html">Lucene Index Structure</a></li>
  <li><a href="learning.lucene.index-opening.html">Index Opening and Creation</a></li>
  <li><a href="learning.lucene.indexing.html">Indexing</a></li>
  <li><a href="learning.lucene.searching.html">Searching</a></li>
  <li><a href="learning.lucene.queries.html">Supported queries</a></li>
  <li><a href="learning.lucene.pagination.html">Search result pagination</a></li>
 </ul>
 </td>
    </tr>
</table>

<script type="text/javascript" src="../js/shCore.js"></script>
<script type="text/javascript" src="../js/shAutoloader.js"></script>
<script type="text/javascript" src="../js/main.js"></script>

</body>
</html>